Abstract

As one of the most important types of (weaker) supervised information in machine learning
and pattern recognition, the pairwise constraint, which specifies whether a pair of data points
should be grouped together or not, has recently received significant attention, especially the
problem of pairwise constraint propagation. At least two reasons account for this trend: first,
compared to data labels, pairwise constraints are more general and easier to collect; second,
since the available pairwise constraints are usually limited, the constraint propagation problem
is important.

This paper provides an up-to-date critical survey of pairwise constraint propagation research.
There are two underlying motivations for us to write this survey paper: the first is to
provide an up-to-date review of the existing literature, and the second is to offer some insights 
into the studies of pairwise constraint propagation. To provide a comprehensive survey, we 
not only categorize existing propagation techniques but also present detailed descriptions of 
representative methods within each category. 
1. Introduction


In constrained clustering tasks, the available prior knowledge is exploited to guide the
clustering process. Examples of prior information for constrained clustering include relative
comparisons [1], pairwise constraints [2], and cluster sizes [3]. Prior knowledge on whether
two objects belong to the same cluster or not is expressed in terms of must-link constraints
and cannot-link constraints, respectively. Generally, such pairwise constraints, unlike the
class labels of data, do not provide explicit class information and are therefore considered a
weaker form of supervisory information. In many situations, pairwise constraint relationships
between data points are more readily available than the actual class labels of the data. Besides
the constrained clustering problem [2, 4, 5, 6], pairwise constraints have also been widely
used for many other machine learning problems such as metric learning [7, 8, 9, 10], and it
has been reported that the use of appropriate pairwise constraints can often lead to improved
results.

In constrained clustering research, especially constrained spectral clustering, pairwise
constraints are exploited for spectral clustering [11, 12, 13, 14], which constructs
a new low-dimensional data representation for clustering using the leading eigenvectors of
the similarity matrix. Since pairwise constraints specify whether a pair of data points should
be grouped together, they provide a source of information about the data relationships, which can be
readily used to adjust the similarities between data points for spectral clustering. In fact,
constrained spectral clustering has been extensively studied. For example, [15] (SL)
simply set the similarities between data points to 1 and 0 for must-link and cannot-link
constraints, respectively. This method only adjusts the similarities between constrained data
points.
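The two ingredients just described, a spectral embedding built from the leading eigenvectors and an SL-style overwrite of constrained similarities, can be sketched in a few lines. This is a minimal illustration under our own assumptions: the Gaussian-kernel similarity, the row normalization of the eigenvectors (following the recipe of [11]), and the function names `sl_adjust` and `spectral_embedding` are hypothetical and not taken from [15].

```python
import numpy as np

def sl_adjust(W, must_link, cannot_link):
    """Overwrite only the constrained entries of a symmetric similarity matrix:
    1 for must-link pairs and 0 for cannot-link pairs (SL-style, as described for [15])."""
    W = W.copy()
    for i, j in must_link:
        W[i, j] = W[j, i] = 1.0
    for i, j in cannot_link:
        W[i, j] = W[j, i] = 0.0
    return W

def spectral_embedding(W, k):
    """Return the k leading eigenvectors of D^{-1/2} W D^{-1/2}, one row per point,
    with each row scaled to unit length. Assumes every point has positive degree."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(S)          # eigenvalues come back in ascending order
    U = vecs[:, -k:]                      # keep the eigenvectors of the k largest
    return U / np.linalg.norm(U, axis=1, keepdims=True)

# Toy data: two well-separated groups of five 2-D points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (5, 2)), rng.normal(3.0, 0.3, (5, 2))])
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
W = np.exp(-sq_dists / 2.0)
np.fill_diagonal(W, 0.0)

W_sl = sl_adjust(W, must_link=[(0, 1)], cannot_link=[(0, 9)])
E = spectral_embedding(W_sl, k=2)   # the rows of E would then be clustered, e.g. by k-means
```

Only the two constrained entries (and their mirrors) change; every other similarity is left untouched, which is exactly the limitation that motivates constraint propagation.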

However, while it is possible to infer pairwise constraints from domain knowledge
or user feedback, in practice such constraints are scarce. Hence, one
line of research in constrained clustering aims to fully utilize the information inherent in the
available pairwise constraints through constraint propagation. It should be noted that the
problem of pairwise constraint propagation differs from that of label propagation and is more
challenging in the following aspects: (a) unlike class labels, pairwise constraints in
general do not provide explicit class information; (b) it is in general not possible to infer class
labels directly from pairwise constraints, which simply state whether a pair of data points belongs
to the same class or not; (c) for a dataset of size n, there are potentially O(n^2) pairwise
constraints that can be inferred through constraint propagation, while there are only O(n) class
labels that need to be inferred for the data in label propagation.
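Point (c) can be made concrete by simply counting: a dataset of n points has n(n - 1)/2 unordered pairs whose constraint status a propagation method may need to fill in, against n labels for label propagation. The helper names below are illustrative only.

```python
def num_pairwise_constraints(n: int) -> int:
    """Number of unordered pairs (i, j) with i < j: n(n - 1)/2, which is O(n^2)."""
    return n * (n - 1) // 2

def num_class_labels(n: int) -> int:
    """One label per data point, which is O(n)."""
    return n

for n in (100, 1_000, 10_000):
    print(f"n={n}: {num_class_labels(n)} labels vs {num_pairwise_constraints(n)} pairs")
```

This prints 4950, 499500 and 49995000 pairs for 100, 1000 and 10000 points, so the gap grows by roughly a factor of n/2.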


2. Pairwise constraint propagation methods 


Different from [15], pairwise constraint propagation studies how to propagate the limited
available pairwise constraints from the constrained data points to the unconstrained data.
For example, [16] (AP) propagated pairwise constraints to the similarities between
unconstrained data points using a Gaussian process. However, as noted in [16], this method makes
certain assumptions for constraint propagation, especially with respect to two-class problems,
although a time-consuming heuristic approach for multi-class problems is also discussed.
Furthermore, constraint propagation is also formulated as a semi-definite programming
(SDP) problem in [17] (SDP). Although that method is not limited to two-class problems, it
incurs an extremely large computational cost for solving the SDP problem. In [18], pairwise
constraint propagation is also formulated as a constrained optimization problem, but only
must-link constraints can be used for optimization.

To overcome these problems in pairwise constraint propagation, an exhaustive and efficient
constraint propagation approach (E²CP) [19, 20], which is neither limited to two-class
problems nor restricted to must-link constraints, has been proposed recently. Specifically, in
[19, 20] the challenging constraint propagation problem is decomposed into a set of
independent semi-supervised learning [21, 22, 23] subproblems. By formulating these
subproblems uniformly as minimizing a regularized energy functional based on Laplacian
regularization [22, 24], [20, 25] have shown that pairwise constraint propagation
can be further transformed into solving a continuous-time Lyapunov equation [26], which occurs
in many branches of control theory such as optimal control and stability analysis [27].
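The reduction to a Lyapunov equation can be illustrated with a small derivation and solver. As an assumption for illustration (the exact functional of [20, 25] may weight the terms differently), take Z as the initial constraint matrix (Z_ij = 1 for must-link, -1 for cannot-link, 0 otherwise), L = I - D^{-1/2} W D^{-1/2} as the normalized Laplacian, and minimize ||F - Z||_F^2 + (μ/2)[tr(FᵀLF) + tr(FLFᵀ)]. Setting the gradient 2(F - Z) + μ(LF + FL) to zero gives AF + FA = Z with A = I/2 + (μ/2)L, a continuous-time Lyapunov equation. Because A is symmetric, the sketch below solves it through one eigendecomposition, which costs O(N^3); the function names and the parameter `mu` are hypothetical.

```python
import numpy as np

def normalized_laplacian(W):
    """L = I - D^{-1/2} W D^{-1/2}; assumes every node has positive degree."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return np.eye(W.shape[0]) - S

def solve_symmetric_lyapunov(A, C):
    """Solve A X + X A = C for symmetric A. With A = U diag(a) U^T the equation
    decouples in the eigenbasis into (a_i + a_j) G_ij = (U^T C U)_ij."""
    a, U = np.linalg.eigh(A)
    G = (U.T @ C @ U) / (a[:, None] + a[None, :])
    return U @ G @ U.T

def propagate_constraints_lyapunov(W, Z, mu=1.0):
    """Stationary point of ||F - Z||^2 + (mu/2)[tr(F^T L F) + tr(F L F^T)]:
    (I/2 + mu L/2) F + F (I/2 + mu L/2) = Z."""
    n = W.shape[0]
    A = 0.5 * np.eye(n) + 0.5 * mu * normalized_laplacian(W)
    return solve_symmetric_lyapunov(A, Z)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
W = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1) / 2.0)
np.fill_diagonal(W, 0.0)
Z = np.zeros((6, 6))
Z[0, 1] = Z[1, 0] = 1.0     # must-link
Z[0, 5] = Z[5, 0] = -1.0    # cannot-link
F = propagate_constraints_lyapunov(W, Z, mu=1.0)
```

The eigenvalues of L lie in [0, 2], so every a_i + a_j is at least 1 and the division is always safe; with mu = 0 the solver returns Z unchanged, as expected when no smoothing is applied.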
Considering that directly solving the Lyapunov equation scales polynomially with the data size,
[19, 20] further develop an approximate but efficient algorithm based on k-nearest neighbor
(k-NN) graphs using the label propagation introduced in [21]. Since the time complexity of the
E²CP algorithm is quadratic with respect to the data size N, i.e., proportional to the total
number of all possible pairwise constraints (N(N - 1)/2), it can be considered computationally
efficient. Compared with the SDP-based constraint propagation [17], whose time complexity
is O(N^4), the E²CP algorithm incurs much less time cost. Finally, the resulting exhaustive
set of propagated pairwise constraints can be used to adjust the similarity matrix for spectral
clustering. Although the E²CP constraint propagation and similarity adjustment approaches
were proposed in the context of constrained spectral clustering, they can be readily applied
to many other machine learning problems provided with pairwise constraints.
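The k-NN-based propagation and the similarity adjustment can be sketched together. Under our assumptions, Z is the N×N initial constraint matrix (+1 must-link, -1 cannot-link, 0 otherwise), S = D^{-1/2} W D^{-1/2} comes from a symmetrized k-NN Gaussian graph, and each column of Z (then each row of the intermediate result) is treated as an independent label-propagation problem solved with the iteration F ← αSF + (1 - α)Y of [21]. The vertical-then-horizontal passes converge to F* = (1 - α)²(I - αS)^{-1} Z (I - αS)^{-1}. The function names, `k`, `sigma`, `alpha`, and the adjustment rule at the end are illustrative assumptions and should be checked against [19, 20] before reuse.

```python
import numpy as np

def knn_gaussian_graph(X, k=3, sigma=1.0):
    """Symmetrized k-NN graph with Gaussian weights and zero diagonal."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    nbrs = np.argsort(-W, axis=1)[:, :k]            # k most similar points per row
    mask = np.zeros(W.shape, dtype=bool)
    mask[np.arange(W.shape[0])[:, None], nbrs] = True
    mask |= mask.T                                    # keep an edge if either end picks it
    return np.where(mask, W, 0.0)

def normalized_similarity(W):
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def label_propagation(S, Y, alpha=0.8, n_iter=200):
    """Iterate F <- alpha S F + (1 - alpha) Y; each column of Y is an
    independent semi-supervised subproblem."""
    F = Y.astype(float)
    for _ in range(n_iter):
        F = alpha * S @ F + (1.0 - alpha) * Y
    return F

def propagate_constraints(S, Z, alpha=0.8):
    F_v = label_propagation(S, Z, alpha)             # vertical pass over columns of Z
    return label_propagation(S, F_v.T, alpha).T      # horizontal pass over rows of F_v

def adjust_similarity(W, F):
    """Illustrative rule: raise W_ij toward 1 where F_ij >= 0, shrink it toward 0 where F_ij < 0."""
    F = np.clip(F, -1.0, 1.0)    # safeguard so the result stays in [0, 1]
    return np.where(F >= 0, 1.0 - (1.0 - F) * (1.0 - W), (1.0 + F) * W)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (5, 2)), rng.normal(3.0, 0.3, (5, 2))])
W = knn_gaussian_graph(X, k=3)
S = normalized_similarity(W)
Z = np.zeros((10, 10))
Z[0, 1] = Z[1, 0] = 1.0      # must-link
Z[0, 9] = Z[9, 0] = -1.0     # cannot-link
F_star = propagate_constraints(S, Z)
W_adj = adjust_similarity(W, F_star)
```

With S stored as a sparse k-NN matrix, each iteration costs O(kN²) for an N×N constraint matrix, in line with the quadratic complexity stated above; dense arrays are used here only to keep the sketch short.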

The methods mentioned previously can only be applied to data within a single modality
or presented in a single representation. In reality, many datasets have multiple
modalities or multiple representations. For example, images found on the Web can be
described either by their visual characteristics or by the surrounding text. Another example
is images from photo-sharing websites such as Flickr, which can typically be
represented using two separate modalities, based respectively on the visual features and the
user-provided textual tags. Thus, the problem of pairwise constraint propagation for
multi-modal data has received increasing attention [28, 29, 30, 31].
In [28], multiple graphs are constructed for the multi-modal data, one graph for each
modality. A random walk process on these graphs is then defined, from which the transition
probabilities among the nodes on multiple graphs can be computed. According to these
transition probabilities, a series of independent multigraph-based constraint propagation
subproblems can be formulated. A graph regularization framework is applied to solve these
subproblems through a series of quadratic optimizations. Furthermore, it is shown that the set
of constraint propagation subproblems can be unified and solved as a single quadratic
optimization problem, so that multi-modal constraint propagation has a closed-form
solution.
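To illustrate propagating over several modality graphs at once, the sketch below uses a deliberately simplified random walk that we assume for illustration only; it is not the multigraph walk defined in [28]. At each step the walker picks modality m with probability β_m and then moves according to that modality's row-stochastic transition matrix P_m = D_m^{-1} W_m, so the combined transition matrix is P = Σ_m β_m P_m. Propagating the constraint matrix Z over columns and then rows with this walk yields the closed form F = (1 - α)²(I - αP)^{-1} Z (I - αP)^{-T}. The names `combined_transition`, `multimodal_propagation`, `betas` and `alpha` are hypothetical.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def combined_transition(Ws, betas):
    """Mixture walk: choose modality m with probability beta_m, then step with D_m^{-1} W_m."""
    P = np.zeros_like(Ws[0])
    for W, beta in zip(Ws, betas):
        P += beta * W / W.sum(axis=1, keepdims=True)
    return P

def multimodal_propagation(Ws, betas, Z, alpha=0.8):
    """Closed form (1 - alpha)^2 (I - alpha P)^{-1} Z (I - alpha P)^{-T}."""
    n = Z.shape[0]
    M = np.linalg.inv(np.eye(n) - alpha * combined_transition(Ws, betas))
    return (1.0 - alpha) ** 2 * M @ Z @ M.T

rng = np.random.default_rng(0)
X_visual = rng.normal(size=(8, 4))    # e.g. visual features of 8 images
X_text = rng.normal(size=(8, 6))      # e.g. tag features of the same 8 images
Ws = [gaussian_affinity(X_visual), gaussian_affinity(X_text)]
betas = [0.5, 0.5]
Z = np.zeros((8, 8))
Z[0, 1] = Z[1, 0] = 1.0      # must-link
Z[2, 7] = Z[7, 2] = -1.0     # cannot-link
F = multimodal_propagation(Ws, betas, Z)
```

Because P is row-stochastic and alpha < 1, I - alpha P is always invertible; for large datasets one would solve the linear systems rather than form the inverse explicitly.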


It should be noted that [28] actually ignores the concept of heterogeneous pairwise
constraints and the strategy of heterogeneous constraint propagation: it assumes
that the constraint settings are consistent among different data modalities. In contrast, [29]
considers the heterogeneous constraint propagation problem, in which the constraint settings
of different data modalities are not necessarily consistent. This approach is motivated by
the homogeneous pairwise constraint propagation method [19]: the heterogeneous constraint
propagation problem can be decomposed into a set of independent semi-supervised
learning subproblems, which can then be efficiently solved by graph-based label propagation
[21]. More importantly, [29] further develop a constrained sparse representation method for
graph construction over each modality using the homogeneous pairwise constraints. That
is, unlike [19] (the homogeneous method), [29] can exploit both heterogeneous and
homogeneous pairwise constraints.
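Heterogeneous constraints relate objects from two different modalities, so they can be stored in an n₁×n₂ matrix Z (+1 must-link, -1 cannot-link, 0 unknown). Assuming the decomposition mirrors the homogeneous case, one can propagate each column of Z over the graph of the first modality and then each row over the graph of the second, using the label propagation of [21] on each side; this converges to F = (1 - α)²(I - αS₁)^{-1} Z (I - αS₂)^{-1}. This is a hedged illustration: the constrained sparse-representation graph construction of [29] is not reproduced (a plain Gaussian graph stands in for it), and all function names are hypothetical.

```python
import numpy as np

def normalized_gaussian_graph(X, sigma=1.0):
    """D^{-1/2} W D^{-1/2} for a fully connected Gaussian graph (stand-in for [29]'s graphs)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def label_propagation(S, Y, alpha=0.8, n_iter=200):
    """Iterate F <- alpha S F + (1 - alpha) Y column by column."""
    F = Y.astype(float)
    for _ in range(n_iter):
        F = alpha * S @ F + (1.0 - alpha) * Y
    return F

def heterogeneous_propagation(S1, S2, Z, alpha=0.8):
    F_v = label_propagation(S1, Z, alpha)            # columns spread over the modality-1 graph
    return label_propagation(S2, F_v.T, alpha).T     # rows spread over the modality-2 graph

rng = np.random.default_rng(0)
S_img = normalized_gaussian_graph(rng.normal(size=(6, 3)))   # modality 1: 6 objects
S_txt = normalized_gaussian_graph(rng.normal(size=(4, 5)))   # modality 2: 4 objects
Z = np.zeros((6, 4))
Z[0, 0] = 1.0      # heterogeneous must-link
Z[1, 3] = -1.0     # heterogeneous cannot-link
F = heterogeneous_propagation(S_img, S_txt, Z)
```

The two graphs may have different sizes; only the inner dimensions of Z have to match them.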


In [31], a unified framework for intra-view and inter-view constraint propagation on
multi-view data has been proposed. Although both intra-view and inter-view constraint
propagation are crucial for multi-view tasks, most previous methods cannot handle them
simultaneously. To address this challenging issue, [31] propose to decompose these two types
of constraint propagation into semi-supervised learning subproblems so that they can be
uniformly solved with traditional label propagation techniques. To further integrate them
into a unified framework, [31] utilize the results of intra-view constraint propagation to adjust
the similarity matrix of each view and then perform inter-view constraint propagation with
the adjusted similarity matrices.
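The three-step pipeline attributed to [31] (intra-view propagation, similarity adjustment, inter-view propagation) can be sketched as follows. Everything concrete here is our assumption: each propagation uses the closed form (1 - α)²(I - αS_a)^{-1} Z (I - αS_b)^{-1} of column-then-row label propagation, the adjustment rule raises similarities for positive propagated constraints and lowers them for negative ones, and the names `two_sided_propagation`, `adjust` and `unified_propagation` are hypothetical rather than taken from [31].

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def normalize(W):
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def two_sided_propagation(S_left, S_right, Z, alpha=0.8):
    """(1 - alpha)^2 (I - alpha S_left)^{-1} Z (I - alpha S_right)^{-1}, via linear solves."""
    A_left = np.eye(S_left.shape[0]) - alpha * S_left
    A_right = np.eye(S_right.shape[0]) - alpha * S_right
    left = np.linalg.solve(A_left, Z)
    # X = left @ inv(A_right)  <=>  A_right^T X^T = left^T
    return (1.0 - alpha) ** 2 * np.linalg.solve(A_right.T, left.T).T

def adjust(W, F):
    """Raise W_ij toward 1 where F_ij >= 0 and shrink it toward 0 where F_ij < 0."""
    F = np.clip(F, -1.0, 1.0)
    W_adj = np.where(F >= 0, 1.0 - (1.0 - F) * (1.0 - W), (1.0 + F) * W)
    np.fill_diagonal(W_adj, 0.0)
    return W_adj

def unified_propagation(W1, W2, Z1, Z2, Z12, alpha=0.8):
    # Step 1: intra-view propagation within each view.
    F1 = two_sided_propagation(normalize(W1), normalize(W1), Z1, alpha)
    F2 = two_sided_propagation(normalize(W2), normalize(W2), Z2, alpha)
    # Step 2: adjust each view's similarity matrix with its intra-view result.
    S1, S2 = normalize(adjust(W1, F1)), normalize(adjust(W2, F2))
    # Step 3: inter-view propagation over the adjusted graphs.
    return two_sided_propagation(S1, S2, Z12, alpha)

rng = np.random.default_rng(1)
W1 = gaussian_affinity(rng.normal(size=(6, 3)))    # view 1: 6 objects
W2 = gaussian_affinity(rng.normal(size=(5, 4)))    # view 2: 5 objects
Z1 = np.zeros((6, 6)); Z1[0, 1] = Z1[1, 0] = 1.0   # intra-view must-link
Z2 = np.zeros((5, 5)); Z2[2, 4] = Z2[4, 2] = -1.0  # intra-view cannot-link
Z12 = np.zeros((6, 5)); Z12[0, 0] = 1.0; Z12[3, 2] = -1.0   # inter-view constraints
F12 = unified_propagation(W1, W2, Z1, Z2, Z12)
```

Using linear solves instead of explicit inverses keeps the sketch numerically safer while computing the same closed form.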


3. Conclusions 

In summary, pairwise constraint propagation has been shown to be an effective way to
exploit a limited number of pairwise constraints. Many pairwise constraint propagation
methods have been proposed for both single-view and multi-view data. These methods have
been reported to achieve promising results in various challenging tasks in pattern recognition,
computer vision, and multimedia content analysis.


Acknowledgements 

This work was supported by National Natural Science Foundation of China under Grants 
61202231 and 61222307, National Key Basic Research Program (973 Program) of China under
Grant 2014CB340403, Beijing Natural Science Foundation of China under Grant 4132037,
Ph.D. Programs Foundation of MOE of China under Grant 20120001120130, the Fundamental
Research Funds for the Central Universities and the Research Funds of Renmin University
of China under Grant 14XNLF04, and the IBM Faculty Award. 

References 

[1] A. Frome, Y. Singer, F. Sha, J. Malik, Learning globally-consistent local distance functions
for shape-based image retrieval and classification, in: Computer Vision, 2007.
ICCV 2007. IEEE 11th International Conference on, IEEE, 2007, pp. 1-8. 





[2] K. Wagstaff, C. Cardie, S. Rogers, S. Schrödl, et al., Constrained k-means clustering
with background knowledge, in: ICML, Vol. 1, 2001, pp. 577-584. 

[3] L. Xu, W. Li, D. Schuurmans, Fast normalized cut with linear constraints, in: Computer 
Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, IEEE, 2009, 
pp. 2866-2873. 

[4] D. Klein, S. D. Kamvar, C. D. Manning, From instance-level constraints to space-level 
constraints: Making the most of prior knowledge in data clustering. 

[5] S. Basu, M. Bilenko, R. J. Mooney, A probabilistic framework for semi-supervised 
clustering, in: Proceedings of the tenth ACM SIGKDD international conference on 
Knowledge discovery and data mining, ACM, 2004, pp. 59-68. 

[6] B. Kulis, S. Basu, I. Dhillon, R. Mooney, Semi-supervised graph clustering: a kernel 
approach, Machine learning 74 (1) (2009) 1-22. 

[7] E. P. Xing, M. I. Jordan, S. Russell, A. Y. Ng, Distance metric learning with application
to clustering with side-information, in: Advances in neural information processing
systems, 2002, pp. 505-512. 

[8] S. C. Hoi, W. Liu, M. R. Lyu, W.-Y. Ma, Learning distance metrics with contextual 
constraints for image retrieval, in: Computer Vision and Pattern Recognition, 2006 
IEEE Computer Society Conference on, Vol. 2, IEEE, 2006, pp. 2072-2078. 

[9] W. Liu, S. Ma, D. Tao, J. Liu, P. Liu, Semi-supervised sparse metric learning using 
alternating linearization optimization, in: Proceedings of the 16th ACM SIGKDD international
conference on Knowledge discovery and data mining, ACM, 2010, pp. 1139-1148.





[10] Z. Fu, Z. Lu, H. H. Ip, H. Lu, Y. Wang, Local similarity learning for pairwise constraint 
propagation, Multimedia Tools and Applications (2014) 1-20.

[11] A. Y. Ng, M. I. Jordan, Y. Weiss, et al., On spectral clustering: Analysis and an algorithm,
Advances in neural information processing systems 2 (2002) 849-856.

[12] U. Von Luxburg, A tutorial on spectral clustering, Statistics and computing 17 (4) (2007) 
395-416. 

[13] J. Shi, J. Malik, Normalized cuts and image segmentation, Pattern Analysis and Machine
Intelligence, IEEE Transactions on 22 (8) (2000) 888-905.

[14] O. Veksler, Star shape prior for graph-cut image segmentation, in: Computer Vision-ECCV
2008, Springer, 2008, pp. 454-467.

[15] S. D. Kamvar, D. Klein, C. D. Manning, Spectral learning, in: International Joint
Conference on Artificial Intelligence, Stanford InfoLab, 2003.

[16] Z. Lu, M. A. Carreira-Perpiñán, Constrained spectral clustering through affinity propagation,
in: Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference
on, IEEE, 2008, pp. 1-8.

[17] Z. Li, J. Liu, X. Tang, Pairwise constraint propagation by semi definite programming for 
semi-supervised classification, in: Proceedings of the 25th international conference on 
Machine learning, ACM, 2008, pp. 576-583. 

[18] S. X. Yu, J. Shi, Segmentation given partial grouping constraints, Pattern Analysis and 
Machine Intelligence, IEEE Transactions on 26 (2) (2004) 173-183. 





[19] Z. Lu, H. H. Ip, Constrained spectral clustering via exhaustive and efficient constraint 
propagation, in: Computer Vision-ECCV 2010, Springer Berlin Heidelberg, 2010, pp. 
1-14. 

[20] Z. Lu, Y. Peng, Exhaustive and efficient constraint propagation: A graph-based learning 
approach and its applications, International journal of computer vision 103 (3) (2013) 
306-325. 

[21] D. Zhou, O. Bousquet, T. N. Lai, J. Weston, B. Scholkopf, Learning with local and 
global consistency, Advances in neural information processing systems 16(16) (2004) 
321-328. 

[22] X. Zhu, Z. Ghahramani, J. Lafferty, et al., Semi-supervised learning using gaussian 
fields and harmonic functions, in: ICML, Vol. 3, 2003, pp. 912-919. 

[23] Z. Lu, H. H.-S. Ip, Image categorization by learning with context and consistency, in: 
Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, 
IEEE, 2009, pp. 2719-2726. 

[24] Z. Lu, Y. Peng, Latent semantic learning by efficient sparse coding with hypergraph 
regularization, in: AAAI, 2011. 

[25] Z. Fu, Z. Lu, H. H.-S. Ip, Y. Peng, H. Lu, Symmetric graph regularized constraint 
propagation, in: AAAI, 2011.

[26] R. H. Bartels, G. Stewart, Solution of the matrix equation AX + XB = C [F4],
Communications of the ACM 15 (9) (1972) 820-826.

[27] Z. Gajic, M. T. J. Qureshi, Lyapunov matrix equation in system stability and control, 
Courier Corporation, 2008. 





[28] Z. Fu, H. H. Ip, H. Lu, Z. Lu, Multi-modal constraint propagation for heterogeneous 
image clustering, in: Proceedings of the 19th ACM international conference on Multi- 
media, ACM, 2011, pp. 143-152. 

[29] Z. Lu, Y. Peng, Heterogeneous constraint propagation with constrained sparse representation,
in: Proceedings of the 2012 IEEE 12th International Conference on Data Mining,
IEEE Computer Society, 2012, pp. 1002-1007. 

[30] Z. Fu, H. Lu, H. H. Ip, Z. Lu, Modalities consensus for multi-modal constraint propagation,
in: Proceedings of the 20th ACM international conference on Multimedia, ACM,
2012, pp. 773-776. 

[31] Z. Lu, Y. Peng, Unified constraint propagation on multi-view data, in: AAAI, 2013.





